Two Statistical Parsing Models Applied To The Chinese Treebank
Abstract
This paper presents the first-ever results of applying statistical parsing models to the newly-available Chinese Treebank. We have employed two models, one extracted and adapted from BBN's SIFT System (Miller et al., 1998) and the other a TAG-based parsing model adapted from Chiang (2000). On sentences with ≤40 words, the former model performs at 69% precision and 75% recall, and the latter at 77% precision and 78% recall.
Related resources
Automatic Error Correction in a Syntactic Treebank Using Transformation-Based Machine Learning
The Treebank is one of the most useful resources for supervised or semi-supervised learning in many NLP tasks such as speech recognition, spoken language systems, parsing and machine translation. A treebank can be developed in different ways, which can generally be categorized into manual and statistical approaches. While the resulting treebank in each of these methods has annotation errors,...
Adapting Multilingual Parsing Models to Sinica Treebank
This paper presents our work for participation in the 2012 CIPS-SIGHAN shared task of Traditional Chinese Parsing. We have adopted two multilingual parsing models – a factored model (Stanford Parser) and an unlexicalized model (Berkeley Parser) for parsing the Sinica Treebank. This paper also proposes a new Chinese unknown word model and integrates it into the Berkeley Parser. Our experiment gi...
Applying Conditional Random Fields to Chinese Shallow Parsing
Chinese shallow parsing is a difficult, important and widely-studied sequence modeling problem. Conditional random fields (CRFs) are new discriminative sequential models that can incorporate many rich features. This paper shows how CRFs can be efficiently applied to Chinese shallow parsing. We apply CRFs and HMMs to the same data set. Our results confirm that CRFs improve the performance ...
Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
We present a simple and effective framework for exploiting multiple monolingual treebanks with different annotation guidelines for parsing. Several types of transformation patterns (TPs) are designed to capture the systematic annotation inconsistencies among different treebanks. Based on such TPs, we design quasi-synchronous grammar features to augment the baseline parsing models. Our approach ca...
Chasing the ghost: recovering empty categories in the Chinese Treebank
Empty categories represent an important source of information in syntactic parses annotated in the generative linguistic tradition, but empty category recovery has started to receive serious attention only very recently, after substantial progress in statistical parsing. This paper describes a unified framework for recovering empty categories in the Chinese Treebank. Our results show that ...
Publication date: 2000